feat: add client conformance tests for SEP-2575 #270

Merged
pcarleton merged 2 commits into modelcontextprotocol:main from anubhav756:feat/stateless-client-tests
May 19, 2026

@anubhav756
Contributor

@anubhav756 anubhav756 commented May 12, 2026

Adds client-side conformance tests for stateless MCP SEP-2575.

Tests are fully aligned with the official traceability requirements in PR #273 (src/seps/sep-2575.yaml).

Note

This is a companion PR to the server-side tests in #271.


Context

SEP-2575 removes the stateful initialize handshake and requires carrying protocol version, implementation info, and capabilities on a per-request basis in the _meta parameters and HTTP headers.

This PR introduces the request-metadata client-side conformance scenario to verify that client implementations correctly follow these new stateless rules, incorporating all feedback from review #270.
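As a rough illustration of the per-request shape described above, the sketch below builds a JSON-RPC request that carries its metadata in `_meta` and mirrors the version in the `MCP-Protocol-Version` header. Only `_meta.protocolVersion` and the header name appear in this PR; the `clientInfo` and `capabilities` key names, the `buildRequest` helper, and the client name are assumptions made for illustration and may not match the SEP's wording.

```typescript
// Hypothetical shape: only `protocolVersion` and the header name come from this PR.
interface RequestMeta {
  protocolVersion: string;
  clientInfo: { name: string; version: string }; // assumed key name
  capabilities: Record<string, object>; // assumed key name
}

interface OutgoingRequest {
  headers: Record<string, string>;
  body: {
    jsonrpc: '2.0';
    id: number;
    method: string;
    params: { _meta: RequestMeta } & Record<string, unknown>;
  };
}

// Hypothetical helper: every request is self-contained, so the metadata is
// attached on each call instead of being negotiated once via initialize.
function buildRequest(
  id: number,
  method: string,
  params: Record<string, unknown>,
  meta: RequestMeta
): OutgoingRequest {
  return {
    headers: {
      'Content-Type': 'application/json',
      // The header must agree with _meta.protocolVersion.
      'MCP-Protocol-Version': meta.protocolVersion
    },
    body: { jsonrpc: '2.0', id, method, params: { ...params, _meta: meta } }
  };
}

const request = buildRequest(1, 'tools/list', {}, {
  protocolVersion: 'DRAFT-2026-v1',
  clientInfo: { name: 'example-client', version: '0.0.1' },
  capabilities: {}
});
```

Deriving the header from the same `meta` object is one way to keep the two values from drifting apart.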


Changes

1. The request-metadata Client Scenario

Adds the new request-metadata scenario to verify per-request version headers, matching _meta payload specifications, and client capability obligations:

  • Optional Capability Conditional Checks:
    • Roots, sampling, and elicitation are optional client capabilities. A specialized client is fully conformant even if it implements only the tools capability and omits the others.
    • Mock Server (src/scenarios/client/request-metadata.ts):
      • If present: verifies they are formatted as valid JSON objects {} (SUCCESS/FAILURE).
      • If absent: records SKIPPED (vacuously conformant).
  • Targeted Reference Client Assertions:
    • Inside the unit tests, we explicitly assert that the comprehensive reference client (everything-client.ts) declarations resolve to SUCCESS (not SKIPPED). This guarantees that we actively test the happy path of all optional capabilities in our reference implementation while keeping optionals skipped-friendly for specialized SDKs.
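The present/absent rule above can be sketched as a small status function. This is an illustrative sketch, not the code in request-metadata.ts: the `checkOptionalCapabilities` name, the result shape, and the exact check IDs are assumptions, as is the choice to treat arrays and `null` as invalid.

```typescript
type CheckStatus = 'SUCCESS' | 'FAILURE' | 'SKIPPED';

interface CheckResult {
  id: string;
  status: CheckStatus;
  message: string;
}

const OPTIONAL_CAPABILITIES = ['roots', 'sampling', 'elicitation'] as const;

// Assumption: "valid JSON object" means a non-null, non-array object.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Hypothetical helper: one result per optional capability.
function checkOptionalCapabilities(
  capabilities: Record<string, unknown>
): CheckResult[] {
  return OPTIONAL_CAPABILITIES.map((name): CheckResult => {
    const id = `sep-2575-client-declares-${name}-capability`; // assumed slug
    if (!(name in capabilities)) {
      // Absent: the requirement was never exercised, so do not report a pass.
      return { id, status: 'SKIPPED', message: `${name} not declared` };
    }
    return isPlainObject(capabilities[name])
      ? { id, status: 'SUCCESS', message: `${name} is an object` }
      : { id, status: 'FAILURE', message: `${name} must be a JSON object` };
  });
}

const results = checkOptionalCapabilities({ roots: {}, sampling: 'string' });
```

Here `roots` would resolve to SUCCESS, `sampling` to FAILURE, and the undeclared `elicitation` to SKIPPED.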

2. Simulated Version Negotiation Retry

The mock server simulates a version negotiation flow to assert the ClientRetrySupportedVersion check (severity WARNING / expectedFailureSlugs compatible):

  • On the first request, the mock server deliberately rejects it with 400 Bad Request and UnsupportedProtocolVersionError (code -32001) carrying data: { supported: ['DRAFT-2026-v1'] }.
  • The client must successfully negotiate and retry using the mutually supported version, updating the check status to SUCCESS.
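A client-side view of this flow might look like the sketch below. It assumes the error payload shape quoted above (`code: -32001`, `data.supported`); the `pickRetryVersion` helper, the preference-ordered `clientVersions` list, and the `hypothetical-v2` version string are made up for illustration.

```typescript
interface JsonRpcError {
  code: number;
  message: string;
  data?: { supported?: string[] };
}

const UNSUPPORTED_PROTOCOL_VERSION = -32001;

// Hypothetical helper: returns the version to retry with, or undefined when the
// rejection is not a version error or there is no mutually supported version.
function pickRetryVersion(
  httpStatus: number,
  error: JsonRpcError,
  clientVersions: string[] // ordered by client preference
): string | undefined {
  if (httpStatus !== 400 || error.code !== UNSUPPORTED_PROTOCOL_VERSION) {
    return undefined;
  }
  const supported = new Set(error.data?.supported ?? []);
  return clientVersions.find((version) => supported.has(version));
}

const rejection: JsonRpcError = {
  code: UNSUPPORTED_PROTOCOL_VERSION,
  message: 'Unsupported protocol version',
  data: { supported: ['DRAFT-2026-v1'] }
};

// The client prefers a version the server does not speak, then falls back.
const retryVersion = pickRetryVersion(400, rejection, [
  'hypothetical-v2',
  'DRAFT-2026-v1'
]);
```

Returning `undefined` on an empty intersection gives the caller a clear signal to stop rather than retry indefinitely.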

How Has This Been Tested?

All 133 unit tests pass with no warnings (npm test), and the code compiles cleanly (npm run check).

E2E Verification Executions:

1. TypeScript Reference Client

npm start -- client --spec-version draft --scenario request-metadata --command "npx tsx examples/clients/typescript/everything-client.ts"

Result: Passed: 7/7, 0 failed, 0 warnings (✅ OVERALL: PASSED)

  • Correctly asserts SUCCESS for all optional capabilities and version negotiation retry checks.

32 Negative Unit Tests (request-metadata.test.ts)

Added extensive negative cases verifying compliance checks correctly trap clients that:

  • Omit the _meta parameter (sep-2575-client-populates-meta → FAILURE).
  • Omit the MCP-Protocol-Version HTTP header (sep-2575-http-client-sends-version-header → FAILURE).
  • Disagree between headers and _meta.protocolVersion (sep-2575-http-version-header-matches-meta → FAILURE).
  • Fail to negotiate or retry on 400 rejections (sep-2575-client-retry-supported-version → WARNING).
  • Exit cleanly without looping/hanging on empty version intersections.
  • Send invalid (non-object) capabilities (roots: "string" → FAILURE).
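To make the first three cases concrete, here is a minimal sketch of how a mock server could derive those statuses from one observed request. The check IDs are the ones listed above; the `checkRequestMetadata` function, the `ObservedRequest` shape, and the choice to emit the match check only when both values are present are assumptions, not the harness's actual implementation.

```typescript
type CheckStatus = 'SUCCESS' | 'FAILURE';

interface Check {
  id: string;
  status: CheckStatus;
}

interface ObservedRequest {
  // Assumption: header names are already lower-cased, as Node's http module does.
  headers: Record<string, string | undefined>;
  params?: { _meta?: { protocolVersion?: string } };
}

// Hypothetical helper: the first two checks are always emitted, so a request
// that omits both the header and _meta cannot slip through unexamined.
function checkRequestMetadata(req: ObservedRequest): Check[] {
  const meta = req.params?._meta;
  const header = req.headers['mcp-protocol-version'];
  const checks: Check[] = [
    {
      id: 'sep-2575-client-populates-meta',
      status: meta ? 'SUCCESS' : 'FAILURE'
    },
    {
      id: 'sep-2575-http-client-sends-version-header',
      status: header ? 'SUCCESS' : 'FAILURE'
    }
  ];
  if (meta && header) {
    // Only comparable when both sides exist (an assumption of this sketch).
    checks.push({
      id: 'sep-2575-http-version-header-matches-meta',
      status: header === meta.protocolVersion ? 'SUCCESS' : 'FAILURE'
    });
  }
  return checks;
}

const mismatch = checkRequestMetadata({
  headers: { 'mcp-protocol-version': 'DRAFT-2026-v1' },
  params: { _meta: { protocolVersion: 'some-other-version' } }
});
```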

Breaking Changes

None.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

None.

@pkg-pr-new

pkg-pr-new Bot commented May 12, 2026

npx https://pkg.pr.new/@modelcontextprotocol/conformance@270

commit: f560b10

@anubhav756 anubhav756 force-pushed the feat/stateless-client-tests branch from cd07785 to 4592cc7 Compare May 12, 2026 15:08
@anubhav756 anubhav756 changed the title from "feat: add MVP stateless client conformance tests for SEP-2575" to "feat: add client conformance tests for SEP-2575" May 13, 2026
Member

@pcarleton pcarleton left a comment

Thanks for tackling this. SEP-2575 is a big one. I ran /new-sep 2575 to build the full requirement-traceability YAML and opened it as #273 so we have a shared reference for both this PR and #271.

Coverage: the YAML has 12 client-side check: rows; this PR currently covers 3 (client-populates-meta, the version-header pair). The 5 stdio-only rows can be a follow-up scenario, but the 3 client-declares-{elicitation,roots,sampling}-capability MUSTs and the client-retry-supported-version SHOULD are HTTP-reachable and would fit nicely here.

Correctness bits:

  • client-consistent-version — I couldn't find spec backing for this. SEP-2575 makes each request self-contained; nothing forbids a client from changing protocolVersion between requests. The flippingVersionClient negative test would fail a conformant client. Suggest dropping.
  • client-cancels-by-notification — spec scopes notifications/cancelled to stdio only ("Streamable HTTP: …No notifications/cancelled message is required or expected"). This scenario is HTTP, so the check can't legitimately fire here.
  • server/discover gap — the example's discover call (everything-client.ts:98) sends no _meta and no MCP-Protocol-Version header, but the harness marks it as passing. client-sends-version-header is guarded by if (currentVersion) and client-populates-meta runs after the discover early-return. I think we want the example to send the header and the check order to catch it if an example doesn't do it.
  • Check IDs — we're shifting to check ids like sep-<NNNN>-<slug> (see sep-2164-error-code in resources.ts); renaming to match #273's slugs will let the traceability tooling line them up automatically.
  • specReferences URLs are empty — #273 has the per-row anchors you can lift.

Member

@pcarleton pcarleton left a comment

(See above)

@pcarleton
Member

Pushed the review fixes as a stacked PR you can merge into this branch: anubhav756#1 (one commit per item above, plus the sep-2575.yaml traceability file and a rename to request-metadata).

@anubhav756 anubhav756 force-pushed the feat/stateless-client-tests branch from 18e3ecc to efd4f4f Compare May 15, 2026 20:25
Member

@pcarleton pcarleton left a comment

This looks great, thanks.

I've been thinking about how we navigate these "stateless" tests that theoretically should be true on every request, not just a standalone scenario. However, that will require us to revisit all the existing tests.

The strategy I think makes the most sense is for us to get our traceability coverage in for the full RC, and then refactor tests / revisit older tests to make them branch-able on version id, and then make sure we still have check coverage.

No action needed on this PR, just wanted to mention that there will likely be a refactor coming in the near future that may change this shape a bit.

Comment thread src/seps/sep-2575.yaml

- text: 'State that needs to span multiple requests (e.g., long-running tasks, application-level handles) MUST be referenced by an explicit identifier the client passes on each request.'
  excluded: 'architectural guidance, observable only via subscriptionId/task-id rows already listed'
- text: 'To distinguish notifications belonging to different concurrent subscriptions, clients MUST correlate notifications using the io.modelcontextprotocol/subscriptionId field carried in _meta.'
Member

(for future) reading this one again, we may want to have a concurrent subscription test, or a test for what to do if you get back a subscriptionId for something you didn't subscribe to

@pcarleton pcarleton merged commit 5c1b815 into modelcontextprotocol:main May 19, 2026
4 checks passed
pcarleton added a commit that referenced this pull request May 19, 2026
* Add SEP-2350 scope-accumulation check to auth/scope-step-up

SEP-2350 clarifies that on step-up re-authorization, clients SHOULD
compute the union of previously-granted and newly-challenged scopes so
they don't lose permissions for other operations.

- Traceability yaml: 1 client check (two spec sentences merged), 1 server
  check tracked for a future server-side scenario, 1 excluded reword
- ScopeStepUpAuthScenario now challenges with only the missing scope so
  union accumulation is observable on the second authorization request
- New check sep-2350-scope-union-on-reauth (WARNING when the previously
  granted scope is dropped)
- everything-client withOAuthRetry now accumulates prior token scope into
  the re-auth scope (passing example)
- New auth-test-echo-scope negative client + vitest case

* feat: add client conformance tests for SEP-2575 (#270)

* feat: merge latest request-metadata scenario stacked branch

* test: implement optional capability conditional checks and simulated negotiation retry

* fix(sep-2243): collapse check IDs to one-per-requirement (#287)

The sep-2243 scenario check IDs drifted from the requirement-traceability
yaml in #259 — the code emitted one ID per test case while the yaml
declares one per normative requirement, so the IDs no longer matched.

Rename the emitted IDs so a check ID maps to a single MUST/SHOULD and is
emitted once per case (the per-case detail moves to name/description),
matching the repo's existing 'same id, vary status/message' convention:

- mcp-method-header-* / mcp-name-header-*  -> client-includes-standard-headers
- reject-invalid-tool-* / keep-valid-tool  -> client-reject-invalid-tool
- server-accepts-{lower,upper}case-name    -> header-name-case-insensitive
- server reject status checks (mismatch/missing/case x method/name)
                                            -> server-reject-invalid-headers
- their error-code variants                 -> server-reject-error-code

Merge the yaml's server-reject-mismatch + server-reject-status into one
server-reject-invalid-headers MUST (HTTP 400 on a header-validation
failure); keep server-reject-error-code as the SHOULD. Base64/custom-header
rejection checks keep their own param-validation IDs (out of scope here).

Gates and the whitespace-acceptance check keep their scenario-matching
names. No behavior change — only check IDs, descriptions, and the yaml.

* feat: `sdk` subcommand to run local conformance against any SDK ref (#277)

* feat: add sdk subcommand to run conformance against any SDK ref (#250)

* Revise sdk runner: explicit --mode, KNOWN_SDKS-only config, v1/v2 entries

Addresses review feedback on the sdk subcommand:

- Require --mode (client|server) and remove "both". Each invocation now
  tests exactly one side with its own exit code; the old default ran
  client then server but combined exit codes with ||=, which skipped the
  server side entirely whenever the client run failed.
- Resolve build/run config from KNOWN_SDKS + CLI flags only; drop the
  conformance.config.yaml loader (no SDK ships one yet). The Zod schema
  stays as the type for the built-in entries.
- Split the typescript entry by major version: typescript-sdk (v2/main,
  pnpm install + build:all, expected-failures.yaml) and typescript-sdk-v1
  (v1.x, npm ci + build, conformance-baseline.yml). An entry may set
  `repo` (real clone target for an alias) and `defaultRef` (branch used
  when no @ref is given). parseSdkSpec now leaves ref undefined when
  omitted so defaultRef can apply; a trailing @ is treated as no ref.
- Key the clone cache by ref (<repo>/<ref>) so different refs of the same
  repo no longer share one checkout.
- Bound the server readiness probe with a per-request AbortSignal timeout
  so a server that accepts the socket but never responds can't hang past
  the overall deadline.

* fix(sdk-runner): resolve -o to absolute path; add --expected-failures override; replaceAll for safeName

* feat: add SEP conformance traceability manifest (#288)

squashed; see PR #288 body

* feat: add server conformance tests for SEP-2575 (#271)

* feat: add server conformance tests for SEP-2575

* refactor and add new checks

* Fix stateless scenario source field and report SKIPPED for inapplicable capability checks

- Add the required `source` field (DRAFT) so the scenario typechecks and is
  selectable via --spec-version. Without it the class did not satisfy the
  ClientScenario interface and s.source was dereferenced by the spec-version
  filter at list time.
- Teach runCheck about a SKIPPED status and use it for the two client-capability
  checks when the server does not return -32003, instead of reporting a green
  PASS for a requirement that was never exercised.

* Remove unimplemented placeholder checks from stateless server scenario

The subscriptions/listen, statelessness-invariant, list-changed and
disconnect-is-cancel checks emitted SUCCESS without probing the server,
which inflated coverage in the traceability report. Drop them until they
have real assertions; the corresponding rows remain declared in
src/seps/sep-2575.yaml so traceability shows them as not-yet-covered.

Also fix the checkErrorId helper to push under the
sep-2575-http-server-error-jsonrpc-id slug so its failure path actually
short-circuits the aggregate SUCCESS check at the end of the scenario.

---------

Co-authored-by: Paul Carleton <paulc@anthropic.com>

* SEP-2352: authorization-server migration scenario (#286)

* Add SEP-2352 authorization-server migration scenario

SEP-2352 requires that client credentials are bound to the issuing
authorization server: when PRM authorization_servers changes to a new
issuer, clients MUST re-register and MUST NOT reuse the previous AS's
client credentials.

- Traceability yaml: 3 checks, 3 excluded (internal state / UI)
- New auth/authorization-server-migration scenario (draft suite): two
  auth servers; PRM flips from AS1 to AS2 after the first authenticated
  request; AS2 asserts it received a fresh /register and never saw AS1's
  client_id at /authorize or /token
- ConformanceOAuthProvider gains invalidateCredentials and bindIssuer so
  the everything-client can key credentials by issuer (passing example)
- everything-client adds an issuer-aware handler for this scenario that
  re-reads PRM on each 401 and rebinds before re-authorizing
- auth-test-reuse-credentials negative client + vitest case

* Drop application_type from DCR metadata (not in SDK OAuthClientMetadata type)

---------

Co-authored-by: Paul Carleton <paulc@anthropic.com>

* ci(traceability): enable corepack so the SDK build can use pnpm (#289)

The refresh run failed with 'pnpm: not found' — the reference SDK's
build command (typescript-sdk: pnpm install && pnpm run build:all) needs
pnpm on PATH. Add 'corepack enable' to the run job.

* ci(traceability): tolerate SDK conformance failures in the run step (#290)

The `sdk` command exits non-zero when the SDK has conformance failures
not in its baseline. The traceability manifest only needs the emitted
check IDs (written regardless of pass/fail), so a failing SDK must not
fail the refresh. Add `|| true`; the existing 'no results produced'
guard catches a genuinely broken run.

* chore: refresh SEP traceability manifest (typescript-sdk@main) (#291)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Make scopesSupported disjoint from challenge scopes so the SEP-2350 union check can't be satisfied by scopes_supported ∪ challenge

A client that unions scopes_supported with the challenge — instead of its
prior grant with the challenge — would have requested mcp:basic mcp:write
and falsely passed sep-2350-scope-union-on-reauth, since scopesSupported
happened to coincide with the previously-granted scope. Setting it to a
disjoint value makes the check actually require the prior token's scope.

Spec backing: clients MUST NOT assume any set relationship between the
challenged scope set and scopes_supported, so a disjoint advertisement is
realistic.

* SEP-837: application_type check in DCR registration (#284)

* Add SEP-837 application_type check to DCR registration

SEP-837 requires MCP clients to specify an appropriate application_type
during Dynamic Client Registration so OIDC authorization servers can
apply the correct redirect-URI constraints.

- Traceability yaml: 1 check (presence + valid value), 4 excluded
  (class-specific SHOULDs unobservable; UI/robustness)
- Check added in the shared createAuthServer DCR handler so it fires in
  every auth scenario that performs DCR (no new scenario)
- withOAuthRetry now sets application_type: native (passing example;
  the conformance example clients are CLI tools)
- New auth-test-no-application-type negative client + vitest case

* Widen ConformanceOAuthProvider metadata type for application_type

Fixes CI typecheck (TS2353 at withOAuthRetry.ts:76): the SDK's
OAuthClientMetadataSchema doesn't include application_type yet.
registerClient() spreads clientMetadata verbatim into the /register
POST body, so a local type intersection is sufficient to get the
field on the wire without an SDK release.

* Set application_type in runAuthMigrationClient

The SEP-2352 authorization-server-migration handler constructs its own
ConformanceOAuthProvider (not via withOAuthRetry), so it was missing
application_type after rebasing onto main. The scenario does DCR twice
(old AS, new AS) and the SEP-837 check fires on each.

---------

Co-authored-by: Paul Carleton <paulc@anthropic.com>

* feat: add conformance tests for iss parameter (SEP-2468) (#220)

* feat: add conformance tests for iss parameter (SEP-2468)

Adds 5 draft conformance scenarios testing RFC 9207 issuer parameter
validation in OAuth authorization responses:

- auth/iss-supported: server advertises support and sends correct iss
- auth/iss-not-advertised: server omits iss parameter entirely
- auth/iss-supported-missing: client must reject missing iss when required
- auth/iss-wrong-issuer: client must reject mismatched iss value
- auth/iss-unexpected: client must reject iss when not advertised

Also adds auth-test-iss-validation.ts, a reference client that correctly
validates iss per RFC 9207, and negative tests confirming the standard
client fails all three rejection scenarios.

TODO: Update RFC_9207_ISS_PARAMETER spec reference once SEP-2468
(modelcontextprotocol/modelcontextprotocol#2468) is merged.

* update scenarios

* fix: createAuthServer iss option type/guard and NotAdvertised scenario duplication

The doc comments said 'Default: not included' but the destructure defaulted
to true/'correct', and the `!== undefined` guard at L155 was unreachable —
so there was no way to omit the metadata field, and IssParameterNotAdvertised
silently advertised support (a duplicate of IssParameterSupported).

Kept the on-by-default behavior (mock AS models a well-behaved server) but
made issParameterSupported `boolean | null` so callers pass null to omit,
matching the codeChallengeMethodsSupported pattern. Doc comments now match.
Scenarios that need omission pass null/'omit' explicitly.

* fix: rejection scenarios silently pass when client never reaches auth endpoint

correctlyRejected = !tokenRequestMade reports SUCCESS if the client errors
out before hitting /authorize. Gate on authReached so a setup failure shows
as FAILURE with authReached:false in details.

* fix: iss-unexpected scenario contradicts SEP-2468 spec table row 3

The spec table says: supported=false/absent + iss present -> *Compare* to
the recorded issuer (not reject). The scenario sent a *correct* iss and
FAILed compliant clients for proceeding after a successful comparison.

Now sends a mismatched iss so the comparison fails and rejection is the
spec-required outcome. Reference client updated to compare-when-present
instead of throw-on-presence.

* refactor: replace harness-config checks with client-proceeded checks

iss-advertised-in-metadata / iss-sent-in-redirect (and the not-* variants)
fired in onAuthorizationRequest before the redirect happened, asserting only
that the harness was configured correctly — a client that ignores iss passes
identically. Replaced with one check per scenario keyed on tokenRequestMade,
which observes that the client actually proceeded through the iss path.

* refactor: rename check IDs to sep-2468-* and align with spec table rows

One ID per spec table row; auth/iss-supported and auth/iss-wrong-issuer
both emit sep-2468-client-compare-iss-supported (same comparison, opposite
input) per the same-slug-for-SUCCESS-and-FAIL convention.

* feat: add sep-2468.yaml requirement traceability

8 check rows (4 client table-row checks, 1 metadata-issuer, 2 AS-side,
1 no-normalization), 1 excluded (error-display is UI-facing). The
record-issuer MUST is merged into the compare-iss-supported row text
since it has no independent wire observation.

* fix: migrate iss scenarios specVersions->source (post-#265)

Replaces `specVersions: ['draft']` with `source: { introducedIn: DRAFT_PROTOCOL_VERSION }`
in the 5 iss-parameter scenarios.

This commit typechecks once the stack is rebased onto main >= #265 (the
ScenarioSource migration). Adding it now so the rebase is mechanical.

* fix: include application_type in iss-validation example DCR (post-#284)

The SEP-837 application_type check now runs in every auth scenario; the
hand-rolled DCR in auth-test-iss-validation.ts was omitting the field.

---------

Co-authored-by: Paul Carleton <paulc@anthropic.com>

---------

Co-authored-by: Anubhav Dhawan <anubhavdhawan@google.com>
Co-authored-by: Paul Carleton <paulcarletonjr@gmail.com>
Co-authored-by: Yuan Teoh <45984206+Yuan325@users.noreply.github.com>
Co-authored-by: Paul Carleton <paulc@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Max Gerber <89937743+max-stytch@users.noreply.github.com>
@anubhav756 anubhav756 deleted the feat/stateless-client-tests branch May 20, 2026 07:06